@fei315412-cmyk

One thread calls mpsc_pbuf_alloc to produce data. The call invokes add_skip_item and then blocks in k_sem_take.

Another thread calls mpsc_pbuf_claim to consume data. In this situation only a small amount of space remains, so mpsc_pbuf_claim calls rd_idx_inc to advance the read index, but there is still no data available to claim.

At that point the consumer should call k_sem_give to wake the thread blocked in mpsc_pbuf_alloc, so the producer can allocate space and continue producing data.

Without this wake-up, the producer thread may wait forever in k_sem_take, which results in a deadlock.
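
To make the sequence above concrete, here is a small stand-alone C model of it. It is only a sketch under stated assumptions, not the code in lib/os/mpsc_pbuf.c: the buffer is reduced to a free-word counter, the semaphore to a `producer_blocked` flag, and every `model_*` name, the word counts, and the `give_on_skip` switch are hypothetical. It also assumes that advancing the read index past the skip item is what makes enough space available for the blocked allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in Zephyr. */
struct model_pbuf {
	size_t free_words;     /* words the producer could allocate right now */
	size_t skip_words;     /* padding (skip item) still sitting at the read index */
	bool producer_blocked; /* producer is parked in the k_sem_take stand-in */
};

/* Stand-in for mpsc_pbuf_alloc: succeed if space fits, otherwise block. */
static bool model_alloc(struct model_pbuf *b, size_t wlen)
{
	if (b->free_words >= wlen) {
		b->free_words -= wlen;
		b->producer_blocked = false;
		return true;
	}
	b->producer_blocked = true;
	return false;
}

/* Stand-in for k_sem_give: release a waiting producer. */
static void model_sem_give(struct model_pbuf *b)
{
	b->producer_blocked = false;
}

/* Stand-in for mpsc_pbuf_claim when only a skip item is present:
 * the read index moves past the padding, but no data packet is returned,
 * so the caller has nothing to pass to a later free call.
 */
static bool model_claim(struct model_pbuf *b, bool give_on_skip)
{
	if (b->skip_words > 0) {
		b->free_words += b->skip_words;
		b->skip_words = 0;
		if (give_on_skip) {
			model_sem_give(b); /* the wake-up proposed in this PR */
		}
	}
	return false; /* no data available */
}

/* Returns true if the producer is still stuck at the end of the scenario. */
bool model_producer_stuck(bool give_on_skip)
{
	struct model_pbuf b = {
		.free_words = 2, .skip_words = 6, .producer_blocked = false,
	};
	const size_t wlen = 8;

	/* Producer: not enough space yet, so it blocks. */
	bool allocated = model_alloc(&b, wlen);
	assert(!allocated && b.producer_blocked);

	/* Consumer: drops the skip item, finds no data. */
	bool has_data = model_claim(&b, give_on_skip);
	assert(!has_data);

	/* The producer can only retry if somebody woke it up. */
	if (!b.producer_blocked) {
		allocated = model_alloc(&b, wlen);
	}
	return !allocated;
}
```

In this model `model_producer_stuck(false)` returns true: enough space is free after the claim, but nothing wakes the producer. `model_producer_stuck(true)` returns false, because the give issued on the skip path lets the producer retry and succeed.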

@github-actions

Hello @fei315412-cmyk, and thank you very much for your first pull request to the Zephyr project!
Our Continuous Integration pipeline will execute a series of checks on your Pull Request commit messages and code, and you are expected to address any failures by updating the PR. Please take a look at our commit message guidelines to find out how to format your commit messages, and at our contribution workflow to understand how to update your Pull Request. If you haven't already, please make sure to review the project's Contributor Expectations and update (by amending and force-pushing the commits) your pull request if necessary.
If you are stuck or need help please join us on Discord and ask your question there. Additionally, you can escalate the review when applicable. 😊

@fei315412-cmyk
Author

@nordic-krch @dcpleung This bug was found in production. Please share any suggestions, thanks.

@dcpleung
Member

I am not familiar with mpsc_pbuf.

@nordic-krch
Contributor

@fei315412-cmyk But after calling mpsc_pbuf_claim (which does not call k_sem_give), the user shall eventually call mpsc_pbuf_free on that buffer, and that call does k_sem_give, so I don't see how that deadlock could occur. Can you add a test case that triggers that behavior?

@fei315412-cmyk
Author

> @fei315412-cmyk but after calling mpsc_pbuf_claim (which does not call k_sem_give) user shall eventually call mpsc_pbuf_free on that buffer and it calls k_sem_give so i don't see how that deadlock could occur. Can you add a test case that triggers that behavior?

Can you tell me how to compile and run the mpsc_pbuf test cases? Please give me an example, and I will write a test case and run it. @nordic-krch

@nordic-krch
Contributor

For example:

west build -p -b qemu_x86 tests/lib/mpsc_pbuf/ -T libraries.mpsc_pbuf.concurrent
west build -t run

@zephyrbot added the area: Tests label on Oct 29, 2025
@zephyrbot requested a review from nashif on October 29, 2025 08:05
@fei315412-cmyk
Author

fei315412-cmyk commented Oct 29, 2025

@nordic-krch
PR submitted. Please review when you have time.
I am waiting for your reply.
Without this commit, mpsc_pbuf_alloc stays blocked in k_sem_take; this commit is needed to wake it up.

The backtrace of the blocked thread is as follows:
Thread 28 (Thread 0xf5fffb40 (LWP 30265) "test_sema_lock"):
#0 0xf7eeadf9 in __kernel_vsyscall ()
#1 0xf7ca8c42 in __libc_do_syscall () at ../sysdeps/unix/sysv/linux/i386/libc-do-syscall.S:39
#2 0xf7c19ee3 in __futex_abstimed_wait_common32 (private=, cancel=, abstime=, op=, expected=, futex_word=) at ./nptl/futex-internal.c:40
#3 __futex_abstimed_wait_common (futex_word=0x8ac58e8, expected=1, clockid=, abstime=0x0, private=0, cancel=true) at ./nptl/futex-internal.c:99
#4 0xf7c1a0df in __GI___futex_abstimed_wait_cancelable64 (futex_word=, expected=, clockid=, abstime=0x0, private=0) at ./nptl/futex-internal.c:139
#5 0xf7c267e2 in do_futex_wait (sem=sem@entry=0x8ac58e8, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:116
#6 0xf7c2688b in __new_sem_wait_slow64 (sem=0x8ac58e8, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:284
#7 0x08056be7 in nct_sem_rewait (semaphore=0x8ac58e8) at /workdir/zephyr/zephyr-git-commit/scripts/native_simulator//common/src/nct.c:149
#8 nct_wait_until_allowed (tt_el=0x8ac58e0, this_th_nbr=25) at /workdir/zephyr/zephyr-git-commit/scripts/native_simulator//common/src/nct.c:179
#9 0x08056cff in nct_swap_threads (this_arg=0x8ac5410, next_allowed_thread_nbr=26) at /workdir/zephyr/zephyr-git-commit/scripts/native_simulator//common/src/nct.c:241
#10 0x080505ce in posix_swap (next_allowed_thread_nbr=, this_th_nbr=) at /workdir/zephyr/zephyr-git-commit/arch/posix/core/posix_core_nsi.c:38
#11 0x080503ea in arch_swap (key=0) at /workdir/zephyr/zephyr-git-commit/arch/posix/core/swap.c:64
#12 0x080530fe in z_swap_irqlock (key=) at /workdir/zephyr/zephyr-git-commit/kernel/include/kswap.h:216
#13 0x080529e5 in z_impl_k_sem_take (sem=0x806194c <mpsc_buffer+44>, timeout=...) at /workdir/zephyr/zephyr-git-commit/kernel/sem.c:158
#14 0x0804ea77 in k_sem_take (timeout=..., sem=0x806194c <mpsc_buffer+44>) at /workdir/zephyr/zephyr-git-commit/build/zephyr/include/generated/zephyr/syscalls/kernel.h:1158
#15 mpsc_pbuf_alloc (buffer=0x8061920 <mpsc_buffer>, wlen=452, timeout=...) at /workdir/zephyr/zephyr-git-commit/lib/os/mpsc_pbuf.c:385
#16 0x0804afd7 in log_buffer_test_sema_lock () at /workdir/zephyr/zephyr-git-commit/tests/lib/mpsc_pbuf/src/main.c:1149
#17 0x08050b5b in run_test_functions (suite=0x8061380 <z_ztest_test_node_log_buffer>, data=0x0, test=0x806155c <z_ztest_unit_test.log_buffer.test_sema_lock>) at /workdir/zephyr/zephyr-git-commit/subsys/testsuite/ztest/src/ztest.c:328

@fei315412-cmyk
Author

Test case result with the commit:
[image]

Test case result without the commit:
[image]
Backtrace:
[image]

@fei315412-cmyk force-pushed the main branch 3 times, most recently from ba8e29e to 9e95451, on October 29, 2025 12:53
@fei315412-cmyk
Author

@nordic-krch @nashif @dcpleung Please take another look and share any suggestions, thanks.
I have also committed a test case that triggers this bug and attached an image of the backtrace.

One thread calls mpsc_pbuf_alloc to produce data. The call invokes
add_skip_item and then blocks in k_sem_take.

Another thread calls mpsc_pbuf_claim to consume data. In this situation
only a small amount of space remains, so mpsc_pbuf_claim calls rd_idx_inc
to advance the read index, but there is still no data available to claim.

At that point the consumer should call k_sem_give to wake the thread
blocked in mpsc_pbuf_alloc, so the producer can allocate space and
continue producing data.

Without this wake-up, the producer thread may wait forever in
k_sem_take, which results in a deadlock.

Signed-off-by: Fei Wang <[email protected]>

Labels

area: Logging, area: Tests

4 participants